Prescription opioids are widely used in specific areas such as cancer therapy and pain relief. From 1999 to 2016, more than 630,000 people died from a drug overdose. As of 2016, 2.1 million Americans had an opioid use disorder, and more than 63,600 people died from drug overdoses that year, making overdose the leading cause of injury-related death in the United States (CDC, 2018). I want to use these statistics to dig into the related data and come up with ideas to help end this crisis. Since different states have different limits and policies for opioid use, I connect the opioid death rate data with every county in the United States. From there, I use multiple methods to analyze the data, including data visualization to show the geographic distribution of the affected areas.
After gathering the raw data, I first merge columns from the different sources into one general dataset, including the number of people of different races, genders, and income levels in each county. After sorting the data by county, I try to find out how different factors relate to the opioid death rate in each county in each year. I will conduct a one-way ANOVA to find out how opioid deaths differ across race groups. Since reports indicate that 2013 to 2016 saw the steepest increase in the opioid crisis, I am interested in the relationship between each factor and the yearly change in the crisis. The results will be presented on a map of the United States, divided by county. Furthermore, I will use linear regression plots, with different colors, to show how closely each factor is related to the death rate. After showing how the factors connect to the death rates, I sum up the report by identifying the most influential factor in the opioid crisis and the year most affected by opioid use. By reaching such a conclusion, people can become more aware of this factor, which may help prevent or reduce the opioid crisis.
As a county-level project, this study draws on seven datasets, which may be hard to follow variable by variable. Luckily, only one or two variables from each dataset are used in this study.
The following cell performs the cleaning process: splitting the combined county and state name into separate county and state fields, creating new variables, and dropping unused variables.
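A minimal sketch of this cleaning step, assuming the raw export stores the location in one column named `County` with values like `"Autauga County, AL"`. The column names `County`, `Deaths`, `Population`, and `Notes`, and the derived `death_rate`, are hypothetical stand-ins rather than the notebook's actual names.

```python
import pandas as pd


def clean_county_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Split 'County, ST' into two columns, add a derived rate, drop unused columns.

    All column names here are hypothetical stand-ins for the raw export.
    """
    df = raw.copy()
    # Split on the last comma only, so the state code is always the final piece
    parts = df["County"].str.rsplit(",", n=1, expand=True)
    df["county_name"] = parts[0].str.strip()
    df["state"] = parts[1].str.strip()
    # Hypothetical new variable: deaths per 100,000 residents
    df["death_rate"] = df["Deaths"] / df["Population"] * 100_000
    # Drop the original combined column and a column not used later
    return df.drop(columns=["County", "Notes"])


raw = pd.DataFrame({
    "County": ["Autauga County, AL", "Kings County, NY"],
    "Deaths": [10, 260],
    "Population": [50_000, 2_600_000],
    "Notes": [None, None],
})
clean = clean_county_table(raw)
print(clean[["county_name", "state", "death_rate"]])
```

Splitting on the last comma (`rsplit` with `n=1`) keeps the state code isolated even if a county name ever contains a comma.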
That is a lot of data.
Luckily, we only need a few pieces from each of them.
Here are some of the variables retrieved:
Based on other research, 2017 was the worst year on record for drug overdose deaths in America, and the steepest increase in the opioid crisis occurred from 2013 to 2016. Therefore, we choose the datasets and variables from 2013 to 2016.
To show the differences between the factors, this study separates the dataset by year. Each year contains the county names, state names, opioid death rate, prescribing rate, cancer death rate, and poverty percentage for each county.
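One way to restrict the data to 2013-2016 and separate it by year is sketched below, assuming the merged data is in long format with one row per county per year and a `year` column. The column names and values are hypothetical.

```python
import pandas as pd

YEARS = [2013, 2014, 2015, 2016]  # study window chosen in the text


def split_by_year(df: pd.DataFrame, year_col: str = "year") -> dict:
    """Keep rows from 2013-2016 and return {year: DataFrame for that year}."""
    in_window = df[df[year_col].isin(YEARS)]
    return {
        int(year): group.drop(columns=year_col).reset_index(drop=True)
        for year, group in in_window.groupby(year_col)
    }


# Hypothetical long-format table: one row per county per year
long_df = pd.DataFrame({
    "year": [2012, 2013, 2013, 2014, 2016, 2017],
    "county": ["A", "A", "B", "A", "B", "A"],
    "state": ["AL", "AL", "NY", "AL", "NY", "AL"],
    "opioid_death_rate": [4.0, 5.0, 12.0, 6.5, 15.0, 7.0],
})
by_year = split_by_year(long_df)
print(sorted(by_year))  # years kept: 2013, 2014, 2016 (no 2015 rows in this toy data)
```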
To map each factor exactly, I chose to keep the missing values at this point and will handle the NaN values later, when doing the regression.
Merge the geometry information with the dataset for each year so that each factor can be plotted on the map.
Here we use the 2013 GeoDataFrame as an example.
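A minimal sketch of the merge, assuming the county shapes carry a five-digit FIPS code in a `GEOID` column and the yearly data carries the same code in a hypothetical `fips` column. In the notebook the left side would be a GeoDataFrame loaded with `geopandas.read_file` from the county shapefile [6]; calling `merge` on a GeoDataFrame keeps the result a GeoDataFrame. To stay runnable without the shapefile, a plain DataFrame with placeholder geometry stands in here.

```python
import pandas as pd


def attach_geometry(shapes: pd.DataFrame, data: pd.DataFrame) -> pd.DataFrame:
    """Left-join yearly data onto every county shape so unreported counties stay NaN."""
    merged = shapes.merge(data, how="left", left_on="GEOID", right_on="fips")
    return merged.drop(columns="fips")


# Placeholder geometry standing in for real county polygons
shapes = pd.DataFrame({"GEOID": ["01001", "36047", "48201"],
                       "geometry": ["<poly A>", "<poly B>", "<poly C>"]})
data_2013 = pd.DataFrame({"fips": ["01001", "36047"],
                          "opioid_death_rate": [5.1, 12.3],
                          "prescribing_rate": [120.0, None]})
geo_2013 = attach_geometry(shapes, data_2013)
print(geo_2013.isna().sum())  # missing values per factor
```

A left join keeps every county in the map, which is why counties with no reported data show up as NaN rather than disappearing.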
The table above shows the number of missing values for each factor. The merge causes many missing values because some counties did not report any information for these factors; these appear as NaN. I did not drop or replace the missing values because doing so would cause visual problems in the plot and make the distribution hard to see on the map. (If I replaced NaN with 0, all the missing counties would be plotted in a light color close to the values I want to see, since most rates are low and close to 0, so the filled-in zeros would be hard to distinguish.) So I would rather keep the missing values as empty space.
The plots below show county-level distribution maps of the four variables from 2013 to 2016.
The plot above shows the opioid death rate, the cancer death rate, the prescribing rate, and the poverty percentage from 2013 to 2016.
What we can see from the plots is that most opioid deaths happened in counties with a high cancer death rate, a high prescribing rate, and a high poverty percentage (the deep colors are all located in approximately the same areas). Opioid deaths are concentrated mostly on the East Coast and in some places on the West Coast. This distribution is the same in every year shown above.
To see how these change over the years, we can combine all the plots above into one grid, with the years along the x-axis and the four factors along the y-axis.
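A sketch of such a grid with matplotlib, assuming one row per factor and one column per year. The drawing function is supplied by the caller; with per-year GeoDataFrames it could call `geo[year].plot(column=factor, ax=ax)`. The factor names and the placeholder drawing function below are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit inside a notebook
import matplotlib.pyplot as plt

YEARS = [2013, 2014, 2015, 2016]
FACTORS = ["opioid_death_rate", "cancer_death_rate", "prescribing_rate", "poverty_pct"]


def make_grid(draw_map, years=YEARS, factors=FACTORS):
    """Return a (factors x years) grid of axes: one row per factor, one column per year.

    draw_map(ax, factor, year) is supplied by the caller.
    """
    fig, axes = plt.subplots(
        len(factors), len(years),
        figsize=(4 * len(years), 3 * len(factors)),
        squeeze=False,  # always get a 2-D array, even for a single row or column
    )
    for i, factor in enumerate(factors):
        for j, year in enumerate(years):
            ax = axes[i, j]
            draw_map(ax, factor, year)
            ax.set_title(f"{factor} ({year})", fontsize=8)
            ax.set_axis_off()  # maps do not need tick labels
    fig.tight_layout()
    return fig, axes


# Usage with a placeholder drawing function instead of real maps
fig, axes = make_grid(lambda ax, factor, year: ax.plot([0, 1], [0, 1]))
```

Reading across a row shows one factor changing over time; reading down a column compares factors within one year.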
This is amazing! From the grid above, we can analyze the yearly change in each factor just by looking at the maps.
As the colors change from light to deep, we can see that the rates keep increasing from 2013 to 2016. Most deaths occur on the East Coast, spread out with large variation, but the densest area is in the Northeast, in states such as New York, Massachusetts, and New Jersey.
Also, from the graph, it is not hard to see that places with a higher poverty rate also have a higher prescribing rate, which is in turn related to both death rates.
From the plots alone, it is hard to see the details of the change or the relationships between the variables, so I use ANOVA and regression to dig into them.
This table shows the output of the ANOVA and whether there is a statistically significant difference between the race groups. The significance value is 6.933151e-10 (i.e., p = 6.933151e-10), which is below 0.05; therefore, there is a statistically significant difference in the opioid death rate between the race groups. This is useful to know, but it does not tell us which specific groups differ. Luckily, we can find this out from the Multiple Comparisons table, which contains the results of the race group regression.
From the OLS regression table, we can compare the coefficients for Asian or Pacific Islander, Black or African American, and White: Asian or Pacific Islander has the lowest coefficient and White has the highest. As a result, we can speculate that White people are the group most affected by the opioid crisis, followed by Black or African American communities, and then other groups such as Asian or Pacific Islander.
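One way to get a coefficient for every race group, sketched under the assumption that the regression had no intercept (the notebook's exact formula is not shown). Dropping the intercept with `- 1` gives each group its own dummy, so each coefficient equals that group's mean rate; with an intercept, the coefficients would instead be differences from a baseline group. Column names and data are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf


def race_coefficients(df: pd.DataFrame) -> pd.Series:
    """OLS of death rate on race with no intercept: one coefficient per group."""
    # "- 1" removes the intercept, so every race level gets its own dummy and
    # each coefficient equals that group's mean opioid death rate
    model = smf.ols("opioid_death_rate ~ C(race) - 1", data=df).fit()
    return model.params.sort_values()


# Hypothetical county-level rates by race group
df = pd.DataFrame({
    "race": ["Asian or Pacific Islander"] * 3
            + ["Black or African American"] * 2
            + ["White"] * 2,
    "opioid_death_rate": [1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 11.0],
})
coefs = race_coefficients(df)
print(coefs)
```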
From the previous maps, we found that the opioid death rate is possibly related to factors such as the prescribing rate in each county, the poverty rate, and the cancer death rate.
Before running the regression, I first deal with the missing values in my data, using interpolate() to fill in the missing entries in my datasets.
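A minimal sketch of filling gaps with pandas `DataFrame.interpolate`, assuming linear interpolation on numeric columns (the notebook's exact arguments are not shown; `limit_direction="both"` is an added assumption). Column names and values are hypothetical.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Linearly interpolate NaN values in the given columns, in row (index) order."""
    out = df.copy()
    # limit_direction="both" also fills NaN at the start of a column,
    # which the default forward direction would leave as NaN
    out[cols] = out[cols].interpolate(method="linear", limit_direction="both")
    return out


df = pd.DataFrame({"prescribing_rate": [np.nan, 100.0, np.nan, np.nan, 130.0],
                   "cancer_death_rate": [180.0, np.nan, 200.0, 210.0, np.nan]})
filled = fill_missing(df, ["prescribing_rate", "cancer_death_rate"])
print(filled)
```

Linear interpolation follows row order, not geography, so a filled value depends on which counties happen to sit next to each other in the table. A run of consecutive NaN values is filled with evenly spaced values on a straight line, which may be why the filled points form bar-shaped clusters in the scatter plots.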
Then, I draw scatter plots to show the relationships.
The plot above shows the relationship between the opioid death rate and the prescribing rate in each year from 2013 to 2016. We can see that the prescribing rate increases suddenly in 2015 (spread out in y), and the opioid death rate then increases in 2016 (spread out in x).
We can ignore the dense, bar-shaped clusters in the graph; they come from the values generated to fill in the NaN entries.
As before, ignore the bar-shaped data, which comes from the filled-in NaN values.
The plot above shows the relationship between the opioid death rate and the cancer death rate. From 2013 to 2016, the data spreads out a lot compared with 2013 (the shape changes from a bar in 2013 to a round cloud in 2016), which suggests that the two variables become more and more related. The more spread-out data shows an approximately linear relationship between the two variables.
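To put a number on "more and more related" instead of reading it from the shape of the point cloud, one option is the Pearson correlation computed separately for each year. This is a sketch on synthetic data, assuming a long-format table with hypothetical columns `year`, `cancer_death_rate`, and `opioid_death_rate`; the slope is the same every year while the noise shrinks, which shows that correlation measures how tightly points follow a line rather than how steep the line is.

```python
import numpy as np
import pandas as pd


def yearly_correlation(df, x="cancer_death_rate", y="opioid_death_rate", year_col="year"):
    """Pearson correlation between x and y, computed separately within each year."""
    return pd.Series(
        {int(year): g[x].corr(g[y]) for year, g in df.groupby(year_col)},
        name="pearson_r",
    )


# Synthetic data: same slope every year, but noise shrinks over time
rng = np.random.default_rng(1)
frames = []
for year, noise_sd in [(2013, 20.0), (2014, 8.0), (2015, 3.0), (2016, 0.5)]:
    cancer = rng.uniform(150, 250, size=300)
    opioid = 0.1 * cancer + rng.normal(0, noise_sd, size=300)
    frames.append(pd.DataFrame({"year": year,
                                "cancer_death_rate": cancer,
                                "opioid_death_rate": opioid}))
r = yearly_correlation(pd.concat(frames, ignore_index=True))
print(r.round(2))
```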
To check what we see in the plots above, we run an OLS linear regression to see whether the result holds.
First, we run OLS between the opioid death rate and the prescribing rate for each year.
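A sketch of fitting one regression per year with statsmodels, assuming a dict of per-year DataFrames with hypothetical columns `prescribing_rate` and `opioid_death_rate`. The slopes in the synthetic data are arbitrary and unrelated to the coefficients reported below.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def slope_by_year(frames, x="prescribing_rate", y="opioid_death_rate"):
    """Fit y ~ x separately for each year and collect the slope, p-value and n."""
    rows = []
    for year, df in sorted(frames.items()):
        # The formula interface drops rows with NaN in x or y before fitting
        fit = smf.ols(f"{y} ~ {x}", data=df).fit()
        rows.append({"year": year, "coef": fit.params[x],
                     "p_value": fit.pvalues[x], "n": int(fit.nobs)})
    return pd.DataFrame(rows).set_index("year")


# Synthetic per-year frames with known slopes, plus one missing value in 2013
rng = np.random.default_rng(2)
frames = {}
for year, slope in {2013: -0.5, 2014: 0.8}.items():
    presc = rng.uniform(0, 100, size=200)
    frames[year] = pd.DataFrame({
        "prescribing_rate": presc,
        "opioid_death_rate": 10 + slope * presc + rng.normal(0, 1, size=200),
    })
frames[2013].loc[0, "opioid_death_rate"] = np.nan
summary = slope_by_year(frames)
print(summary)
```

Collecting the p-value next to each slope makes it easy to see whether a small coefficient is distinguishable from zero.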
From above, we can see the relationship between the opioid death rate and the prescribing rate in 2013: the coefficient is -0.002, a negative slope, so a higher prescribing rate is associated with a lower opioid death rate.
This suggests that 2013 was not too bad for us: at least, prescriptions may have helped relieve people's suffering.
From above, we can see the relationship between the opioid death rate and the prescribing rate in 2014: the coefficient is 0.0012, a positive slope, so a higher prescribing rate is associated with a higher opioid death rate.
From above, we can see the relationship between the opioid death rate and the prescribing rate in 2015: the coefficient is 1.105e-05, again a positive slope, so a higher prescribing rate is associated with a higher opioid death rate.
Since 2015 was one of the worst years of the opioid crisis, this result is not surprising. From 2014 to 2015, opioid prescribing appears to have affected the opioid crisis a lot.
From above, we can see the relationship between the opioid death rate and the prescribing rate in 2016: the coefficient is -0.0007, a negative slope, so a higher prescribing rate is associated with a lower opioid death rate.
The coefficient turns negative in 2016, which might be related to the prescribing limits the government introduced in 2016; if so, the limits appear to have worked and brought the problem back toward normal.
To show the relationship directly, we draw the graph below.
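One way to draw such a graph is to overlay one fitted line per year on shared axes. This sketch uses `numpy.polyfit` with degree 1, which gives the same least-squares slope as OLS with a single predictor; it assumes the same hypothetical per-year frames and column names as the regression sketch, with toy data on exact lines.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_fitted_lines(frames, x="prescribing_rate", y="opioid_death_rate"):
    """Draw one least-squares line per year on shared axes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    for year, df in sorted(frames.items()):
        d = df[[x, y]].dropna()
        # np.polyfit with deg=1 returns [slope, intercept]
        slope, intercept = np.polyfit(d[x], d[y], deg=1)
        grid = np.linspace(d[x].min(), d[x].max(), 50)
        ax.plot(grid, intercept + slope * grid, label=f"{year}: slope {slope:+.4f}")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return fig, ax


xs = np.arange(0, 11, dtype=float)
frames = {
    2013: pd.DataFrame({"prescribing_rate": xs, "opioid_death_rate": 10 - 0.5 * xs}),
    2014: pd.DataFrame({"prescribing_rate": xs, "opioid_death_rate": 2 + 0.3 * xs}),
}
fig, ax = plot_fitted_lines(frames)
```

Putting the slope in each legend label lets the plot be checked against the regression tables at a glance.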
As shown above, the plot agrees with what we got from the tables: the slope is negative in 2013 and 2016 and positive in 2014 and 2015. The 2014 line increases the most, which makes 2014 the year most affected by prescribing.
Using the same kind of plot, we can also get the results for the cancer death rate.
The plot above shows the relationship between the opioid death rate and the cancer death rate, from which we can see that 2014 has a higher cancer death rate than the other years.
The plot above shows the opioid death rate in relation to the poverty percentage for all ages.
To account for all the other factors that might affect the opioid death rate, we run a linear regression on all factors together.
The table below shows the linear regression on all factors. All the coefficients are positive, from which we can infer that 2014 was a big year for opioid use and that the cancer death rate was the most influential factor, with a coefficient of 4.4823, higher than the other factors.
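A sketch of a regression on all factors at once, with year as a categorical effect, assuming a long-format table with hypothetical column names and synthetic data (2013 is the baseline year). Raw coefficients depend on each variable's units, so comparing them directly across factors can mislead; z-scoring the continuous predictors first is one common way to make them comparable, shown here as an option.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

PREDICTORS = ["prescribing_rate", "cancer_death_rate", "poverty_pct"]


def fit_all_factors(long_df, standardize=False):
    """OLS of opioid death rate on all factors, with year as a categorical effect.

    With standardize=True, each continuous predictor is z-scored first, so its
    coefficient is the change in death rate per one standard deviation.
    """
    df = long_df.copy()
    if standardize:
        df[PREDICTORS] = (df[PREDICTORS] - df[PREDICTORS].mean()) / df[PREDICTORS].std()
    formula = "opioid_death_rate ~ " + " + ".join(PREDICTORS) + " + C(year)"
    return smf.ols(formula, data=df).fit()


# Synthetic long-format data with a known year effect
rng = np.random.default_rng(3)
n = 150
parts = []
for year, shift in [(2013, 0.0), (2014, 3.0), (2015, 1.0), (2016, 0.5)]:
    part = pd.DataFrame({
        "year": year,
        "prescribing_rate": rng.uniform(40, 120, n),
        "cancer_death_rate": rng.uniform(150, 250, n),
        "poverty_pct": rng.uniform(5, 30, n),
    })
    part["opioid_death_rate"] = (0.01 * part["prescribing_rate"]
                                 + 0.05 * part["cancer_death_rate"]
                                 + 0.2 * part["poverty_pct"]
                                 + shift + rng.normal(0, 1, n))
    parts.append(part)
long_df = pd.concat(parts, ignore_index=True)

raw_fit = fit_all_factors(long_df)
std_fit = fit_all_factors(long_df, standardize=True)
print(raw_fit.params.round(3))
print(std_fit.params[PREDICTORS].round(3))
```

Each `C(year)[T.20xx]` coefficient is that year's shift relative to 2013, holding the other factors fixed.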
As a result, the opioid death rate is most strongly associated with the cancer death rate, and 2014 shows the largest increase in opioid deaths.
Prescription opioids can be used to treat moderate-to-severe pain and are often prescribed following surgery or injury, or for health conditions such as cancer. In recent years, there has been a dramatic increase in the acceptance and use of prescription opioids for the treatment of chronic, non-cancer pain, such as back pain or osteoarthritis, despite serious risks and the lack of evidence about their long-term effectiveness. Because prescribing is one of the factors most strongly associated with opioid deaths, the government should regulate prescription opioids with strict laws to reduce opioid use from the start.
However, prescription opioids can be helpful, especially in cancer treatment, which may be why the cancer death rate is most strongly related to the opioid death rate. To address this, I suggest using alternative medicines that might not be as effective as opioids but do not carry the risk of addiction.
Surprisingly, I found that the relationship between the poverty rate and opioid deaths changed over the years: a higher poverty rate was associated with a higher opioid death rate in 2013, but from 2014 on, a higher poverty rate was associated with a lower opioid death rate. I searched online, and the results showed that the prices of opioid medicines increased several times between 2013 and 2015. That might be why poorer people could not afford medication for cancer or opioids, which may have contributed to the increase in both death rates. To solve this, I think the government should pass laws limiting changes in opioid prices. Some pharmaceutical companies produce opioids because of the large profits involved; with regulation or restrictions, the balance between profit and manufacturing would be broken, so that opioids could be replaced by something more useful for treatment.
The weaknesses of my model can be overcome. The most important weakness, the small sample size after merging, can be remedied by waiting until more data becomes available over time.
Real-life data always has a lot of missing values, and this method did not handle them well, since it replaced missing values using basic methods such as the mean and median. A better way to solve this would be to apply linear regression between my variables, or to use the k-nearest neighbors (k-NN) method to generate the best-fitting values for the missing data, which should reduce the error introduced by imputation.
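A minimal sketch of k-NN imputation with scikit-learn's `KNNImputer`, assuming the factors sit in numeric columns (names and values here are hypothetical). Each missing entry is replaced by the mean of that column over the k most similar rows, where similarity is a Euclidean distance that skips missing coordinates.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def knn_fill(df, cols, k=5):
    """Fill NaN in cols with the mean of the k most similar rows (NaN-aware Euclidean distance)."""
    out = df.copy()
    out[cols] = KNNImputer(n_neighbors=k).fit_transform(df[cols])
    return out


df = pd.DataFrame({"opioid_death_rate": [1.0, 3.0, np.nan, 8.0],
                   "poverty_pct": [2.0, 4.0, 6.0, 8.0]})
filled = knn_fill(df, ["opioid_death_rate", "poverty_pct"], k=2)
print(filled)
```

Here the row with poverty_pct 6.0 is closest to the rows with 4.0 and 8.0, so its missing rate becomes the mean of 3.0 and 8.0. Because the distance is scale-sensitive, columns with very different ranges should usually be standardized first. For the regression-based alternative, scikit-learn also offers `IterativeImputer`, which must be enabled with `from sklearn.experimental import enable_iterative_imputer`.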
Our analysis also fails to account for the fact that the month or day likely has a strong effect on opioid deaths, so implementing some sort of time series analysis would likely give us greater power in detecting differences in our variables of interest.
Other ways to improve the model include the use of cross-validation to see if a polynomial term of some variables could provide more predictive power without overfitting.
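A sketch of that check with scikit-learn, comparing cross-validated mean squared error for polynomial degrees 1 to 3 of a single predictor. The data is synthetic (a quadratic curve plus noise), and the degrees, fold count, and seed are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def cv_mse_by_degree(x, y, degrees=(1, 2, 3), folds=5, seed=0):
    """Mean cross-validated MSE of a polynomial regression for each degree."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    cv = KFold(n_splits=folds, shuffle=True, random_state=seed)
    scores = {}
    for d in degrees:
        model = make_pipeline(PolynomialFeatures(degree=d, include_bias=False),
                              LinearRegression())
        # scikit-learn maximizes scores, so MSE is reported as a negative number
        scores[d] = -cross_val_score(model, X, y, cv=cv,
                                     scoring="neg_mean_squared_error").mean()
    return scores


rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 200)
y = 0.5 * x ** 2 + rng.normal(0, 1, 200)
scores = cv_mse_by_degree(x, y)
print({d: round(s, 2) for d, s in scores.items()})
```

The degree with the lowest held-out error is preferred; if a higher degree does not lower it further, the extra term is likely fitting noise.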
Another area of interest would be to compare big cities such as New York City or Miami instead of only analyzing counties.
[1] Centers for Disease Control and Prevention. (2018, December 5). Multiple Cause of Death, 1999-2017 Request. Retrieved December 5, 2018, from https://wonder.cdc.gov/mcd.html
[2] Centers for Disease Control and Prevention. (2018, December 5). Prescription Opioid Data. Retrieved December 5, 2018, from https://www.cdc.gov/drugoverdose/data/prescribing.html
[3] Centers for Disease Control and Prevention. (2018, December 5). Cancer Data and Statistics. Retrieved December 5, 2018, from https://www.cdc.gov/cancer/dcpc/data/index.html
[4] U.S. Census Bureau. (2018, December 5). Income and Poverty in the United States (2013-2016). Retrieved December 5, 2018, from https://www.census.gov/data/tables/2017/demo/income-poverty/p60-259.html
[5] U.S. Census Bureau. (2018, December 5). Cartographic Boundary Shapefiles - States. Retrieved December 5, 2018, from https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html
[6] U.S. Census Bureau. (2018, December 5). Cartographic Boundary Shapefiles - Counties. Retrieved December 5, 2018, from https://www.census.gov/geo/maps-data/data/cbf/cbf_counties.html